Autophagy characteristics and establishment of autophagy prognostic models in lung adenocarcinoma and lung squamous cell carcinoma

Background Non-small cell lung cancer (NSCLC), which makes up the majority of lung cancers, remains one of the deadliest malignancies in the world. It has a poor prognosis due to its late detection and lack of response to chemoradiaiton. Therefore, it is urgent to find a new prognostic marker. Methods We evaluated biological function and immune cell infiltration in lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) patients from TCGA and GEO databases between different clusters based on autophagy related hub genes. Autophagy scores were used to assess the degree of autophagy in each individual by using component analysis. Results Three different clusters were obtained. Gene set variation analysis, single-sample gene set enrichment analysis and survive analysis showed differences among these three clusters. We demonstrated that the autophagy score of each patient could predict tumor stage and prognosis. Patients with a high autophagy score had a better prognosis, higher immune infiltration, and were more sensitive to immunotherapy and conventional chemotherapy. Conclusion It was uncovered that autophagy played an irreplaceable role in NSCLC. Quantified autophagy scores for each NSCLC patient would help guide effective treatment strategies.


Introduction
Lung cancer is still one of the most difficult problems, and it is one of the leading causes of cancer death worldwide [1]. Histopathologically, lung cancer is subdivided into small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). Among them, NSCLC accounts for 85% of all lung cancer cases, and it is mainly composed of lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD) [2]. With the research and exploration of related technologies, NSCLC has made undeniable progress in surgical treatment, chemotherapy and immunotherapy. However, the 5-year survival rate of NSCLC is still poor in China. Mainly due to the lack of early diagnosis and classification treatment. Therefore, the only way to alleviate this situation is to find a new prognostic assessment method to guide individualized treatment of NSCLC patients [3].
Studies have shown that autophagy is associated with the occurrence and progression of lung cancer [4]. There are three types of autophagy: macroautophagy, microautophagy and chaperone-mediated autophagy. Macroautophagy is the main form of autophagy, which is what we usually refer to as the type of autophagy. This is a highly conserved catabolic process that degrades damaged or dysfunctional proteins or organelles through lysosomes [5,6]. The target substrate is first separated by the phagophore (a unique membrane structure) into a double-membrane autophagosome. Then, the autophagosome sends the cargo for degradation through fusing with endosomes and lysosomal vesicles to mature into amphisomes and autolysosomes. Finally, the degradation products will be recycled for cell synthesis biological reactions. Macroautophagy relies on the production of autophagosomes. On the contrary, during microautophagy, the lysosome use its own membrane to swallow substrates and degrade them. In chaperone-mediated autophagy, after the target protein with a specific sequence is recognized by the chaperone, it binds to the special receptor on the lysosomal membrane-the lysosomeassociated membrane glycoprotein type 2A (Lamp2A), and enters the lysosome to be degraded. However, there are pros and cons to autophagy. In the initial stage of cancer, it can swallow cancer cells and protect normal cells from tumor invasion. In the advanced stage of cancer, it can also provide energy and nutrition for cancer cells to promote their growth [7]. Related research found that high expression of autophagy-related gene 10 was associated with poor prognosis of lung cancer. SKI like proto-oncogene promotes tumorigenesis and immune escape of NSCLC by up-regulating the Tafazzin, Phospholipid-Lysophospholipid Transacylase/autophagy axis. Casein Kinase 1 Alpha 1 inhibits lung tumor growth by stabilizing PTEN and inducing autophagy [8][9][10]. These studies had proved that autophagy was involved in lung cancer, and suggested that autophagy-related genes might be considered as prognostic markers in lung cancer.
In this study, we explored the relationship between autophagy-related genes (ARGs) and the prognosis of LUAD and LUSC, and established an autophagy scoring system to evaluate patients. Based on the score, we could predict the prognosis of patients and guide patients to make the best treatment choice.

Data acquisition and pre-processing
The gene expression profiles and associated clinical information of lung cancer were available from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/) on April 11th, 2021. The database includes clinical data of various human cancers (including tumor subtypes), which is an important data source for cancer researchers. The download gene expression profiles met the following conditions: (1) the primary site was "bronchus and lung"; (2) the program was "TCGA"; (3) the disease type were "adenomas and adenocarcinomas" and "squamous cell neoplasms"; (4) the data category was "transcriptome profiling"; (5) the data type was "Gene Expression Quantification"; (6) the workflow type was "HTSeq-FPKM". The TCGA samples include 103 normal samples and 999 tumor samples, of which 502 lung squamous carcinoma samples and 497 lung adenocarcinoma samples are among the tumor samples. In addition, we downloaded series matrix files and platform files of three datasets (GSE41271, GSE42127, GSE37745) from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/), a database for storing chips, nextgeneration sequencing, and other high-throughput sequencing data. Patients diagnosed with LUAD or LUSC were selected. The basic information of the three datasets is provided in Table 1 [11].
We also download somatic mutation data of LUAD and LUSC from TCGA database. The data were acquired according to the following conditions: (1) the data category was "simple nucleotide variation"; (2) the data type was "Masked Somatic Mutation"; (3) the workflow type was "VaScan2 Variant Aggregation and Masking". Meanwhile the copy number (gene-level) data of GDC TCGA-LUAD and TCGA-LUSC were got from UCSC Xena (http://xena.ucsc. edu/) for following Copy Number Variation (CNV) analysis [12].
Gene expression profiles from TCGA were converted to the matrix of standard human gene expression of each sample. We took the average of the expression of the same gene. Then the FPKM (Fragments Per Kilobase Million) data were converted into the TPM (Transcripts Per Million) data [13]. The two steps were using the R package "limma". At the same time, we extracted clinical information files (including id, age, gender, survival status, overall survival time (OS), stage and pathology) and Gene expression file (including id and probe name) from the two series of matrix files downloaded from GEO database respectively. According to the platform file, we use Perl to batch convert the probe names in the gene expression files into standard human gene names. The batch effect caused by non-biotechnology deviations in these three datasets was rectified by using the "combat" function in the R package "SVA" [14]. If the pathology of the sample was normal or the sample lacking any clinical information such as sample id, age, gender, survival status, overall survival time (OS) and pathology would be deleted in the clinical information file. The merger cohort consisting of TCGA-LUAD, TCGA-LUSC, GSE41271, and GSE42127 was used as the training set, and GSE37745 was used as the validation set. R (version 4.0.4) and strawberry-perl (version 5.30.2.1) were used to process data.

PPI network construction and acquirement of autophagy-related hub genes (ARHGs)
Autophagy-related genes (ARGs) were obtained from Human Autophagy Database (http:// www.autophagy.lu/index.html). A PPI network of ARGs was constructed by using the STRING database (https://string-db.org/), where minimum required interaction score was set to 0.95 and free nodes were deleted. Then the network was visualized by using Cytoscape software (version 3.7.2), in which the CytoNCA plug-in (version2.1.6) was used to calculate scores of each node's BC (Betweenness), CC (Closeness), DC (Degree), EC (Eigenvector), LAC (Local Average Connectivity-based method) and NC (Network). Only the nodes that were higher than the median value of each score at the same time would be retained, and node genes were called ARHGs [15].

Unsupervised clustering for ARHGs
Intersection ARHGs were extracted from three integrated datasets. We applied unsupervised clustering analysis to classify the samples based on the expression of the intersection of ARHGs. We used the R package "ConsensusClusterPlus" to repeat the above steps 1000 times for guaranteeing the stability of classification. The number of clusters and stability was determined by the clustering algorithm [16].

Gene set variation analysis (GSVA)
GSVA, an unsupervised gene set enrichment method which was used to estimate the differences in the activity of different pathways between high and low score groups. In order to reveal the differences in biological pathways between different clusters based on ARHGs, we downloaded the "c2.cp.kegg.v7.4.symbols" from Gene Set Enrichment Analysis (http://www. gsea-msigdb.org/gsea/index.jsp). The R package "GSVA" was used to calculate the score of separate pathways in each sample, where the parameter of "min.sz" was set to 10, "max.sz" was set to 500 and "parallel.sz" was set to 1. Subsequently, differential analysis of pathways was carried out by using the R package "limma". The pathway adjusted P < 0.05 was regarded as having differentially expressed pathway [17,18].

Immune infiltration analysis
We downloaded the gene set for marking each TME infiltrating immune cell type which was based on Charoentong's research on immunity from "The Cancer Immunome Atlas" (http:// icbi.at/TCIA/SupplementaryTables.pdf). Single-sample gene set enrichment analysis (ssGSEA) method was carried out to calculate the relative abundance of each immune cell infiltration in each sample in the TME (tumor microenvironment) of lung cancer by R package "GSVA" [19].

Screening of differential expressed genes (DEGs) between ARHG-clusters
In order to discover DEGs related to autophagy, R package "limma" and "VennDiagram" were used to identify DEGs between ARHG-clusters, of which DEGs whose corrected p-value was less than 0.001 were retained [20].

Function and pathways analysis of DEGs
GO (Gene Ontology) and KEGG (Kyoto Encyclopedia of Genes and Genomes) were conducted to analyse the function and pathways of DEGs by utilizing the R packages "org.Hs.eg. db". The filter condition was that P-value and q-value were both less than 0.05. According to the number of gene enrichment, sorting from largest to smallest, we took the top 30 KEGG pathways and the top 30 GOs terms including the top 10 BP (biological process) terms, 10 CC (cellular component) terms and 10 MF (molecular function) terms [21].

Establishment of ATscore(autophagy score) System
In order to evaluate the degree of autophagy in each lung cancer patient, we constructed a set of autophagy scoring system through autophagy genes related to prognosis which we called ATscore. This system was set up through the following steps: According to the patient's survival status and survival time, univariate Cox analysis of DEGs was performed to find genes related to prognosis and only p-values<0.05 were regarded as significant [22]. Patients were structured in several groups for further analysis by using an unsupervised clustering method for analyzing DEGs related to prognosis. The parameters were similar to ARHGs. The cluster was named geneCluster. At the same time, we analyzed the differences expression of ARHGs between geneClusters.
Then we used the PCA (principal component analysis) method to construct a scoring system. Components 1 and 2 would be chosen to form this scoring system [23]. We used a method similar to GGI [24] to define the ATscore of each patient: Where i represented the expression of DEGs related to prognosis.

Correlation between ATscore and clinical traits
In order to know the relationship between ATscore and clinical characters, we analyzed ATscore with each patient's gender, age, stage, pathology, and survival status, and verified the relationship between ATscore and survival among different independent clinical traits. Meanwhile we analyzed patients with the same clinical traits separately to eliminate the interference of clinical traits on the results. Univariate Cox analysis and multivariate cox analysis were used to explore the relationship between ATscore and survival.

Treatment prediction of LUAD and LUSC based on ATscore
To know the immunogenicity of patients with high and low ATscore in LUAD and LUSC. We obtained the immunophenotypic score (IPS) of TCGA-LUAD and TCGA-LUSC patients from The Cancer Immunome Database (TCIA, https://tcia.at/home). Then the differences of IPS between high ATscore group and low ATscore group in LUAD and LUSC were analyzed [25]. We use R package "pRRophetic" to predict the half-maximal inhibitory concentration (IC50) of five common chemotherapy drugs (cisplatin, gemcitabine, paclitaxel, vinorelbine, and methotrexate) used to treat LUAD and LUSC [26]. The prediction of sensitivity of LUAD and LUSC patients to these five chemotherapy drugs was based on Genomics of Drug Sensitivity in Cancer (GDSC, https://www.cancerrxgene.org/).
To further verify the accuracy of our model, we calculated the ATscore of each patient in GSE37745 according to the previous methods (unsupervised cluster analysis based on ARHGs, acquisition of prognostic-related DEGs, PCA analysis), and analyzed the survival of patients between the high and low ATscore groups.

Statistical analysis
The wilcox test was used to compare the differences between the two groups, and the Kruskal-Wallis test was used to compare the differences between three groups and above. By utilizing the "Surv-Cutpoint" function in the R package "Survminer", the patients were divided into high ATscore group and low ATscore group. Tumor mutation burden (TMB) is the total number of gene coding errors, base substitutions, gene insertion or deletion errors per million bases in tumor cells. The TMB was also in the same method [27]. The survival time of the patient was evaluated by KM (Kaplan-Meier) survival analysis, and the different groups were compared by using a log-rank test. We used the "Pearson" method to calculate correlations coefficients between ARHGs, and P-value less than 0.0001 was thought meaningful. The hazard ratios (HR) for ARHGs and DEGs related to prognosis were calculated by univariate cox analysis. R package "Corrplot" was performed to visualize the relationships which were calculated by the "corr.mtest" function between Autophagy-score and immune cells. The relationship coefficient between TMB and Autophagy-score was obtained by the "Spearman" method. We used R package "RCircos" [28] to display the copy number variation of ARHGs in TCGA-LUAD and TCGA-LUSC on 23 human chromosomes and used "Waterfull" function in R package "maftools" to show the mutation status in TCGA-LUAD and TCGA-LUSC, as well as the mutation status between high and low ATscore groups. The calculation of all data was done through R (version 4.0.4) [29]. The R packages used in this study and their usage methods could be obtained from "bioconduction" (http://bioconductor.org/packages/release/ BiocViews.html#___Software).

Mutations of ARGs in LUAD and LUSC
In order to know the central role of ARGs, we screened 44 ARHGs from 232 ARGs, and the specifics would be shown in Fig 1A. First, we assessed the frequency of total CNV of ARHGs in TCGA data set. It was found that the CNV changes of ARHGs were universal, most of the genes were in the increase of copy number, of which the MYC gene copy number increase was the most active, while a few genes were in the state of copy number loss, and the ATG16L1 gene copy number loss was the most (Fig 1B). Secondly, we performed somatic mutation analysis on TCGA-LUAD and TCGA-LUSC respectively. 350 samples out of 561 LUAD samples were mutated, with a mutation rate of 62.39%. Among them, TP53 had the highest mutation frequency, followed by EGFR, while ATG101 and other genes had no mutations (Fig 2A). In LUSC, 409 out of 491 samples were mutated, and the mutation frequency was 83.3%. Interestingly, the highest mutation rate was also TP53. The second highest was RB1 and no mutations in ATG5 etc. (Fig 2B). Correlation analysis show that most ARGHs were positively correlated, and a few were negatively correlated such as EIF4EBP1 and MAP1LC3C, EIF4EBP1 and CFLAR (S1A Fig). Further analysis of the genes with the highest mutation frequency showed that the expression levels of most genes in the TP53 mutant group were higher than those in the wild group, such as TP53, EGFR, and MYC. Some genes were expressed more highly in the wild group, such as MAP1LC3B, CFLAR and FOS (Fig 2C) [30]. In order to determine whether the above-mentioned mutations affected the expression of ARHGs in patients who were LUSC or LUAD, we compared the expression levels of ARHGs in normal people and patients with LUAD or LUSC in the TCGA database. Compared with normal tissues, most ARHGs are highly expressed in lung cancer tissues. Interestingly, we found that ARHGs with active copy number increase (e.g. MYC, EIF4EBP1, EGFR) and ARHGs with active copy number loss (e.g. ATG16L1, MTOR, ULK1) also showed high expression in lung cancer tissues ( Fig 1C). Therefore, we speculated that the change of CNV was an element that led to the disorder of ARHGs expression in tumor tissues. Through the above analysis, ARHGs are highly heterogeneous in normal tissues and lung cancer tissues. Through the above analysis, ARHGs had a high degree of heterogeneity in normal tissues and lung cancer tissues, which indicated that the expression imbalance of ARGs played a very important role in the development of lung cancer [31].

The unsupervised clustering based on ARHGs
LUAD and LUSC from the TCGA database and two cohorts (GSE41271, GSE42127) from the GEO database were formed into a new cohort (1419 lung cancer patients) which we named merger cohort. We intersected the genes in the merger cohort with 44 ARHGs, and finally got 38 ARHGs [32].
According to the expression of 38 ARHGs, we used unsupervised clustering method to classify tumor samples. The full name of the CDF value was cumulative distribution function value which was the integral of the probability density function. K was the number of clusters. The consistent cumulative distribution function (CDF) graph showed the cumulative distribution function when k took different values. It was used to determine when k took a value, the CDF reached an approximate maximum value, and the cluster analysis result at this time was the most reliable. That was, considering the value of k with a small slope of CDF. Finally, we found that when K was set to 3, the best CDF value was obtained. The tumor samples were split into 3 different groups (Fig 3A), and were termed ATcluster A (573 samples), ATcluster B (469 samples), ATcluster C (377 samples) respectively. There were significant differences between these three different clusters (Fig 4B). Survival analysis was performed on these three clusters, and it was found that ATcluster C had a more survival advantage (Fig 3B). It can be seen from the heat map that the expression levels of ARHGs were dissimilar between different clusters and the expression levels in ATcluster B were lower than those in the other two clusters. We also found that in male patients with LUSC, most of the deaths were concentrated in ATcluster A, rather than in ATclusters B and ATcluster C (Fig 3C) [33].

Differences in function and immune infiltration between ATclusters
Based on the merger cohort, we used GSVA to explore the potential role of these ARHGs, and pathways with P-value less than 0.05 would be considered meaningful [34]. It could be clearly seen that the functions of three ATclusters were different. ATcluster A was actively involved in pathways related to cell replication and repair, such as DNA replication, cell cycle, mismatch repair, nucleotide excision repair and RNA degradation. While cluster C was significantly enriched in tumor suppressor and immune pathways, including TGF-β signaling pathway, Fc epsilon RI signaling pathway, B cell receptor signaling pathway, complement and coagulation cascades and phosphatidylinositol signaling system. ATcluster B mainly focused on biological metabolic pathways such as sulfur metabolism, pentose and glucuronate interconversions, porphyrin and chlorophyll metabolism and pyrimidine metabolism (Fig 5A-5C). We subsequently performed GSEA analysis on these three clusters, and found that the infiltration of immune cells between clusters was very different. ATcluster C contained abundant immune cell infiltration, ATcluster A was the least, and ATcluster B lay between the two. While activated CD4 T cell and CD56bright natural killer cell were the opposite (Fig 4A). Surprisingly, patients in ATcluster C showed a corresponding survival advantage (Fig 3B). Combined with the GSVA analysis results, we speculated that the infiltration of immune cells might play an important role in antitumor [26].

Identification of DEGs related to autophagy and geneClusters
In order to identify potential DEGs related to autophagy, we performed a differential analysis on the merger cohort based on three ATsclusters. As showed in Fig 4C, a total of 2996 DEGs related to autophagy were selected (adjusted P-value < 0.001). To explore the potential functions of these DEGs, GO and KEGG enrichment analysis were used. We obtained a total of

PLOS ONE
Construction of signature by autophagy-related genes 1369 GO terms (P-value <0.05, q-value <0.05) and 43 KEGG pathways (P-value <0.05, qvalue <0.05). Sorting according to the number of genes from high to low, we selected the top 30 GO terms which included 10 BP (biological process) terms, 10 CC (cellular component) terms and 10 MF (molecular function) terms and 30 KEGG pathways respectively. The results showed that 30 GO terms and 30 KEGG pathways mentioned above were mainly concentrated in immune response−activating cell surface receptor signaling pathway, negative regulation of immune system process, neutrophil activation involved in immune response, T cell activation, ubiquitin−like protein ligase binding, protein serine/threonine kinase activity, TNF signaling pathway, and B cell receptor signaling pathway (S2A and S2B Fig) [35].
We then performed univariate Cox analysis on these 2996 DEGs related to autophagy, and finally identified 1803 DEGs (P-value <0.05) related to prognosis. In order to further verify the regulatory mechanism of autophagy, we performed unsupervised clustering analysis based on the 1803 genes obtained. Surprisingly, consistent with the previous ARHGs clustering, the patients were also divided into three clusters (Fig 6A). The three clusters were named geneCluster A~C. Survival analysis showed that geneCluster B had the most advantage in survival (Fig 6B). It could be found that female LUSC patients were mainly concentrated in geneCluster A (Fig 6C). Subsequently, based on three geneClusters, we performed a differential analysis of the expression levels of 38 ARHGs in merger cohort with "Limma" R package. As expected, there was a significant difference in the expression of ARHGs between the three geneClusters (S3A Fig).

The relationship between ATscore and traits of each subtype
Due to individual differences, we constructed a scoring system which was termed ATscore to quantify the degree of autophagy in each patient based on 1803 DEGs related to prognosis. Survival analysis revealed that patients with high ATscore were better than those with low ATscore in survival (Fig 7B). The attribute changes of each patient were presented in the Sankey chart (Fig 7A). We also assessed the correlation between immune cells and ATscore (S1B Fig). The Kruskal-Wallis test revealed that the ATscore was significantly different between ATclusters and geneClusters. Among ATclusters, ATcluster A had the lowest ATscore and ATcluster C had the highest score (Fig 7C). Analysis of GSVA and ssGSEA showed that ATcluster A was mainly related to the occurrence and development of tumor cells, and ATcluster C was mainly related to tumor suppression and immunity (Fig 5A-5C). Therefore, we speculated that patients with high ATscore tended to suppress tumors, while those with low ATscore tended to be subjected to tumors. Among geneClusters, geneCluster B had the highest ATscore, and geneCluster A had the lowest ATscore (Fig 7D). Combined with previous survival analysis (Figs 3B and 6B), it also could be seen that patients with higher ATscore had more advantage in survival which was consistent with the result (Fig 7B).
For the purpose of exploring the relationship between ATscore and occurrence and development of tumor, we conducted an analysis of the relationship between TMB and ATscore in LUAD and LUSC respectively. In LUAD, through quantitative analysis of TMB, it could be seen that tumors with low ATscore exhibited high TMB (Fig 8C). There was likewise a significant negative correlation between ATscore and TMB (Fig 8D). In order to further test this relationship, we divided LUAD patients into high and low groups based on ATscore. It was found that the degree of gene mutation in the group with low ATscore was greater than that in the group with high ATscore (Fig 8A and 8B). We then analyzed LUSC patients in the same way. Surprisingly, the results were similar to LUAD (Fig 9A-9D). We also explored the impact of TMB and ATscore on survival. In LUAD, patients with high TMB and high ATscore survived longer (S3B and S3C Fig). In LUSC, although it might be affected by the number of samples, we could still see that patients with high TMB had a survival benefit (S3D and S3E Fig).

ATscore was a prognostic biomarker
We evaluated the correlation between ATscore and clinical characteristics (age, gender, stage, pathology). It was found that there were significant differences between clinical characteristics (except age) and ATscores. Between the high-score and low-score groups, there was no difference in age distribution. Patients younger than or equal to 65 years old accounted for 45%, and those older than 65 years old accounted for 55% (S4A Fig). Quantitative analysis showed that the ATscore between patients less than or equal to 65 years old and patients older than 65 years old had no meaning (S4F Fig). The surviving female LUAD patients with stage I~II were mostly concentrated in the high ATscore group, while those who died of stage III~IV male LUSC patients were mostly concentrated in the low ATscore group (S4B-S4E Fig). By quantitative analysis, the ATscore of surviving patients was higher than that of dead patients (S4G Fig). Using the same method to analyze gender, pathology, and stage, we found that female had higher ATscore than male (S4H Fig), patients with LUAD had higher ATscore than patients with LUSC (S4I Fig) and ATscore in patients with stage I~II were higher than those with stage III~IV (S4J Fig).
In order to further test the applicability of the ATscore system, we divided patients with the same clinical trait into high ATscore group and low ATscore group, and analyzed them for survival. To our surprise, in each individual clinical trait, patients with high ATscore had better survival than patients with low ATscore (S5A-S5H Fig). Although the P-value of patients with

PLOS ONE
LUSC who were less than or equal to 65 years old and stage which was III~IV was greater than 0.05, they had a certain guiding effect on the prognosis (S5B-S5F and S5H Fig).
To explore the relationship between the prognosis of LUSC and LUAD and the ATscore, we divided patients with the same clinical characteristic into two groups based on pathology. We first performed survival analysis on them separately. Although their P-value was greater than 0.05 (except for patients with stage I~II), their results indicated that the prognosis of LUAD with the same clinical characteristic was better than that of LUSC (S6A- S6F Fig). As expected, through quantitative analysis, it was found that the ATscore of patients with LUAD with the same clinical characteristic was higher than that of patients with LUSC (S7A- S7F  Fig), which further validated the results obtained before (S4D and S4I Fig).
Next, we conducted univariate and multivariate cox analysis of ATscore and clinical characteristics (age, gender, stage, pathology). Univariate cox analysis showed that age, stage, gender, and pathology were risk factors for patient prognosis, while ATscore was a protective factor. Results of multivariate cox analysis showed that age, stage, and ATscore could be used as independent prognostic factors for evaluating the prognosis of patients (Table 2).
From the validation of patients in GSE37745, we were surprised to find that the results were as expected-patients with high autophagy lived longer (Fig 10I).

Treatment strategy of LUAD and LUSC based on ATscore
We assessed the expression levels of four common immune checkpoints (PD1, PD-L1, PD-L2, and CTLA-4) in LUAD and LUSC, and analyzed differences based on the patients' ATscore. From the results, it could be seen that the immune checkpoint molecules were lowly expressed in patients with high ATscore, and were highly expressed in patients with low ATscore in LUAD (Fig 8E-8H). However, the results obtained were exactly the opposite of LUAD in LUSC, although their P-value was greater than 0.05 (Fig 9E-9H). At the same time, based on the ATscore, we analyzed the IPS score in LUAD and LUSC to predict their immunogenicity. In LUAD, IPS and IPS-CTLA4 scores were higher in patients with high ATscore (Fig 10A-10D). In LUSC, patients who were in the high ATscore group had higher IPS-PD1/PD-L1/ PD-L2, IPS-CTLA4, IPS-PD1/PD-L1/PD-L2-CTLA4 scores (Fig 10E-10H). These results indicated that LUAD patients with high ATscores might respond better to CTLA4 immunotherapy, while LUSC patients with high ATscores might get a better response to CTLA4, PD1/ PDL1/PDL2, CTLA4-PD1/PDL1/PDL2 immunotherapy [36].
Based on the ATscore, we also analyzed the sensitivity of several common chemotherapy drugs (Cisplatin, Gemcitabine, Paclitaxel, Vinorelbine and Methotrexate) in LUAD and LUSC. The results showed that in LUAD, patients with high ATscores were sensitive to these five chemotherapeutics (S8A-S8E Fig). In LUSC, patients with high ATscores were sensitive to cisplatin, paclitaxel, and vinorelbine (S8F- S8J Fig) [37].

Discussion
Autophagy is a process of self-digestion, which plays a great role in maintaining cell stability. It can remove waste and harmful substances in cells, and can inhibit the occurrence of tumors. The tumor suppressor function of autophagy has been widely recognized [38]. However, recent studies have shown that autophagy has a dual effect. It can also provide nutrition for tumor cells to promote the growth of tumor cells, and allow tumor cells to evade immune surveillance which leads to drug resistance [39]. Therefore, in the process of improving the efficacy of anti-tumor drugs and reducing tumor resistance, autophagy is such as to become a key regulatory point. Research on autophagy is expected to improve the survival rate of NSCLC patients.
We conducted research on LUAD and LUSC, which accounted for the largest proportion of NSCLC. Based on the TCGA database, we conducted a preliminary exploration and found that the expression of autophagy in lung cancer was relatively active. We speculated that autophagy played a major role in the occurrence and development of lung cancer. Therefore, we conducted further research on autophagy and established a set of autophagy scoring system. We combined the autophagy score with clinical characteristics, gene expression, immune cell infiltration, and tumor mutation burden. As expected, the autophagy score was significantly correlated with immune cell infiltration, tumor mutation burden, gender, status, pathology, and staging. Survival analysis showed that patients with high autophagy score had an advantage in survival. In order to reduce the interference of other factors, we analyzed patients with the same clinical traits. Surprisingly, the conclusion still holed. Univariate and multivariate COX analysis suggested that autophagy score could be an independent prognostic factor and a protective factor for patient prognosis.
In preliminary exploration of the role of autophagy in LUAD and LUSC, mutation analysis showed that the mutation frequency of TP53 was the highest. And we were surprised to find that the expression of key autophagy genes was different between the TP53 mutant group and the TP53 wild group. In the subsequent high autophagy score group and low autophagy score group, the mutation frequency of TP53 also appeared to be the highest. As we all know, TP53 is a tumor suppressor, and the inactivation of TP53 function is a common feature of human tumors. TP53 is also involved in complex biological processes such as cell DNA repair, apoptosis, and aging [40,41]. Therefore, we could speculate that the mutation of TP53 occupied a very influential position in the development of lung cancer. Among ARGs, TP53 was likely to become an immune regulatory point that improved the survival rate of lung cancer patients and reduced drug resistance.
In the process of analyzing ATclusters, we divided the samples into ATcluster A~C according to the expression of ARHGs. Survival analysis showed that patients in ATcluster C had a longer survival time, followed by patients in ATcluster B, and patients in ATcluster A had shorter survival time. GSVA analysis suggested that ATcluster A was mainly focused on cell replication and repair, ATcluster C was mainly focused on tumor suppression and immunity, and ATcluster B was somewhere in between. Coincidentally, ATcluster A had the lowest autophagy score, ATcluster C had the highest score, and ATcluster B was in between. Combining the relationship between tumor mutation burden and autophagy score, it could be seen that the higher the autophagy score, the higher the probability of tumor mutation, and patients showed anti-tumor and immune processes, and the longer the survival time. On the contrary, the less the probability of tumor mutation, the performance was tumor cell proliferation and repair, and the survival time was shorter. A series of analyses on geneclusters further confirmed this idea.
We compared LUAD patients and LUSC patients with the same clinical traits. From the survival analysis, it could be seen that the survival time of LUAD patients was longer than that of LUSC patients, and the autophagy score of LUAD was higher than that of LUSC. We could conclude that the lower the autophagy score, the less conducive to survival, and the higher the degree of malignancy of the tumor. In LUAD, although the expression level of CTLA-4 in patients with high autophagy score was higher than that in patients with low autophagy score, immunogenicity analysis suggested that the immunogenicity of CTLA-4 in patients with high autophagy score was higher than that of patients with low autophagy score. It can be speculated that LUAD patients with high autophagy scores were more suitable for CTLA-4 immunotherapy. In LUSC, the expression level of immune checkpoints in patients with high autophagy scores was greater than that in patients with low autophagy scores, and the immunogenicity of CTLA-4/PD1 in patients with high autophagy scores was higher than that of patients with low autophagy scores too. It could be seen that LUSC patients with high autophagy score were suitable for CTLA-4, PD1 or combined immunotherapy. Sensitivity analysis of common chemotherapy drugs proved that patients with high autophagy score were more sensitive. The above analysis confirmed that the prognosis of patients with high autophagy score was better from another angle [42]. Survival analysis of patients between high and low ATscore in GSE37745 further confirmed this model.
Although the autophagy score has a good predictive and prognostic guiding role for LUAD and LUSC patients, there are still some limitations that needed to be resolved. First of all, these samples are collected from public databases, and there may be selection bias. Secondly, the transcription of the genes in the database comes from tumor tissues (including the tumor itself and the tumor microenvironment), so it is impossible to distinguish where the signature (ARG and ARHG) in this research mainly comes from. Thirdly, infiltration of immune cells, tumor mutational burden, effects of immunotherapy and commonly used chemotherapeutics are only inferred by computer calculations, and clinical verification is lacking. Studies have shown that autophagy has its duality in lung cancer. It can inhibit cancer and promote cancer. In this article, the autophagy score only roughly reflects the side of autophagy inhibiting cancer, and does not reflect the side of excessive autophagy promoting cancer. Therefore, further experiments and clinical data are needed to confirm and perfect this finding [43].